ABSTRACT


The empirical power of a new multivariate goodness-of-fit
test proposed by Foutz (1980) is investigated. The test has
been applied to Monte Carlo samples from bivariate and
trivariate normal distributions with a variety of mean
vectors and covariance matrices. The null hypothesis tested
is that the sample is from a multivariate normal
distribution with mean vector 0 and covariance matrix the
identity I. The observed number of rejections in 5000
replications is used as the measure of effectiveness of the
test. The results indicate that the Foutz test is quite
capable of detecting mean and variance shifts but is not as
powerful against covariance shifts.


I. INTRODUCTION


In statistical analysis, choosing the correct distribution
to model available data is of importance. A class of
procedures known as goodness-of-fit tests has been derived
to test the hypothesis that a set of samples is from a given
distribution. Many of these tests are readily available and
are well known, such as the Chi-square or the
Kolmogorov-Smirnov (K.S.) goodness-of-fit test. These tests
were designed for univariate distributions and are not
usable as multivariate goodness-of-fit tests in their
present form.

In 1980 Robert V. Foutz [Ref. 1] proposed a new multivariate
goodness-of-fit test that will be called the Fn test in the
sequel. In analogy to the K.S. test the Fn test compares a
hypothesized cumulative distribution function (CDF) with a
"continuous empirical distribution function" (CEDF) formed
from sampled data. Foutz found the null distribution of the
test to be distribution free as well as being independent of
the number of variates p.

Foutz obtained an integral expression for the null
distribution of the Fn test statistic, and closed form
solutions for sample size 2 or 3 were provided. The
complexity of the integral expression increases with sample
size, and a normal approximation to the null distribution
was given for use with larger sample sizes. Although the Fn
test was designed as a multivariate goodness-of-fit test it
can also be used to fit univariate distributions. Franke and
Jayachandran [Ref. 2] compared the empirical power of the Fn
test with that for the Chi-square test and the K.S. test.
The results indicated that the Fn test competes well with
these other tests.

The power of the Fn test as a multivariate goodness-of-fit
test is investigated in this thesis. A description of the
Foutz test is given in Section II and the Monte Carlo
methods of simulation are presented in Section III. The
results and conclusions are in Section IV. A Fortran code
for the application of the Fn test is available in the
Appendix.





II. THE FOUTZ TEST


The Fn test for multivariate goodness-of-fit is based on
a comparison of a hypothesized CDF with a continuous
empirical distribution function (CEDF) derived from a sample.
The first step in the determination of the CEDF is the
construction of what are known as statistically equivalent
blocks. A general method for determining statistically
equivalent blocks, due to Anderson [Ref. 3], is described
below.

Given a random sample $X_1, X_2, \ldots, X_{n-1}$ from a p-variate
continuous distribution, select n-1 functions $h_k(X)$,
$k = 1, 2, \ldots, n-1$, not necessarily distinct, such that each
$h_k(X)$ has a continuous distribution. These functions are
referred to as cutting functions and will be used to partition
the sample space into blocks. Let $k_1, k_2, \ldots, k_{n-1}$ be a
permutation of $1, 2, \ldots, n-1$. Order the $X_i$'s according to
$h_{k_1}(X)$ and define $X(k_1)$ as the $k_1$th order statistic.
The sample space is partitioned into two blocks:

$$B_1 = \{X : h_{k_1}(X) \le h_{k_1}(X(k_1))\},$$

$$B_2 = \{X : h_{k_1}(X) > h_{k_1}(X(k_1))\}.$$

At the second step, if $0 < k_2 < k_1$, the $k_1 - 1$ X's in $B_1$
are ordered according to $h_{k_2}(X)$; $X(k_2)$ is defined as the
$k_2$th in the ordering. Define a cut on $B_1$, obtaining 3 blocks
as follows:

$$B_{11} = B_1 \cap \{X : h_{k_2}(X) \le h_{k_2}(X(k_2))\},$$

$$B_{12} = B_1 \cap \{X : h_{k_2}(X) > h_{k_2}(X(k_2))\},$$

$$B_2 = B_2.$$
Now consider the other alternative, $k_2 > k_1$. We rank the
$(n-1) - k_1$ X's in the second block $B_2$ according to
$h_{k_2}(X)$ and let $X(k_2)$ be the $(k_2 - k_1)$th largest in the
ranking. Defining a cut at $h_{k_2}(X(k_2))$ we obtain the 3
blocks,

$$B_1 = B_1,$$

$$B_{21} = B_2 \cap \{X : h_{k_2}(X) \le h_{k_2}(X(k_2))\},$$

$$B_{22} = B_2 \cap \{X : h_{k_2}(X) > h_{k_2}(X(k_2))\}.$$


This process is continued until all the cutting functions are
exhausted. This results in a partition of the sample space
into n statistically equivalent blocks, which are denoted by
$B_i$, $i = 1, \ldots, n$.

In the univariate case an intuitively appealing choice for
the cutting functions is the identity function, viz.,
$h_k(X) = X$ for all k. The resulting statistically equivalent
blocks are then $(-\infty, X(1)], (X(1), X(2)], \ldots, (X(n-1), +\infty)$,
where $X(j)$ is the jth order statistic. The multivariate
analogue is to choose individual coordinates as cutting
functions, viz., $h_k(X) = x^{(j)}$, the jth coordinate of $X$. An
example illustrating the construction of the blocks in the
bivariate case is given below for a sample of size 8.


Let (2,4,6,8,1,3,5,7) be the permutation vector K. Define
$h_k(X) = x^{(1)}$, the first coordinate of $X$, for k = 2,4,6,8 and
$h_k(X) = x^{(2)}$, the second coordinate, for k = 1,3,5,7. Figure 1
gives a graphical representation of the rectangular coordinate
method of forming blocks and Figure 2 is the representation
for the polar coordinate method. The random sample that was
used in both figures is found in Table I.


TABLE I: SAMPLE BIVARIATE DATA 


N = 8 
Observation    1      2      3      4      5      6      7      8
Coordinate 
ik aoe eee tod. 67 6 62.00 = .75) —-2.25 . 0.00 
2 Ge 00m=2).25 Oreo ee OO mel. 2a =—)o50,. =—1.50 -0.50 
The first element of the permutation vector is $k_1 = 2$ and
$h_{k_1}(X) = x^{(1)}$; therefore $X^{(1)}(k_1)$ is defined to be the
second smallest first coordinate. This partitions the sample
space into two blocks,

$$B_1 = \{X : x^{(1)} \le X^{(1)}(k_1)\},$$

$$B_2 = \{X : x^{(1)} > X^{(1)}(k_1)\}.$$


Figure 2. STATISTICALLY EQUIVALENT BLOCKS--POLAR COORDINATES


The second element of the permutation vector is $k_2 = 4$,
$h_{k_2}(X) = x^{(1)}$ and $k_2 > k_1$. Hence the block $B_2$ is
partitioned into two sub-blocks,

$$B_{21} = B_2 \cap \{X : x^{(1)} \le X^{(1)}(k_2)\},$$

$$B_{22} = B_2 \cap \{X : x^{(1)} > X^{(1)}(k_2)\},$$

where $X^{(1)}(k_2)$ is the second largest first coordinate among
the X's in block $B_2$. At this stage the sample space is
partitioned into three blocks. Next, the third element of the
permutation vector and the corresponding cutting function
define another partition of one of the three blocks into two
sub-blocks. This process is continued until the permutation
vector is exhausted, at which stage the sample space will be
partitioned into 9 statistically equivalent blocks.

The CEDF is now constructed by spreading a mass 1/n within
each block. If $H_0$ is the hypothesized CDF and $\hat{H}$ the
CEDF, the test statistic Fn takes the form

$$F_n = \sup \, |\hat{H}(X) - H_0(X)|. \qquad (1)$$

Let $D_i$, $i = 1, 2, \ldots, n$, be the probability contents of
the blocks $B_i$ under the null hypothesis $H_0$; i.e.,
$D_i = \int_{B_i} dH_0(X)$. A computational form of the Foutz
test statistic is

$$F_n = \sum_{i=1}^{n} \max\left[0, \frac{1}{n} - D_i\right]. \qquad (2)$$
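
As an illustration of the computational form (2), the following
minimal Python sketch evaluates Fn from a list of block
probability contents. The function name `foutz_statistic` is
hypothetical and is not taken from the Fortran program in the
Appendix.

```python
def foutz_statistic(block_probs):
    """Computational form (2): Fn = sum_i max(0, 1/n - D_i).

    block_probs holds the null probability contents D_i of the n
    statistically equivalent blocks (they should sum to 1).
    """
    n = len(block_probs)
    return sum(max(0.0, 1.0 / n - d) for d in block_probs)


# Equal contents give Fn = 0; contents far from 1/n give a larger Fn.
print(foutz_statistic([0.25, 0.25, 0.25, 0.25]))
print(foutz_statistic([0.70, 0.10, 0.10, 0.10]))
```

Only blocks whose hypothesized content falls below 1/n contribute,
so Fn grows as the hypothesized distribution places too little
mass where the sample is concentrated.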


Foutz gave the following representation for the cumulative
distribution of the test statistic


P(Fn< x) = r) ene Es Gy (54 1S are eer 513) dd, a5 54-- +1 AO a 
(3) 
wer 
oe yee) = mi tn-l): 
EO 
eo oo) 5 tite ee eer 
n Ih Ze Giulio ed: 


The evaluation of this integral is cumbersome and has not been
carried out for n > 5. Foutz has therefore derived a large
sample normal approximation given by

$$\lim_{n \to \infty} P[F_n < x] = \Phi\left[ \frac{x - e^{-1}}{\left( (2e^{-1} - 5e^{-2})/n \right)^{1/2}} \right]. \qquad (4)$$

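
To make the use of the approximation concrete, the sketch below
computes approximate critical values in Python. It assumes that
Fn is treated as normal with mean e^-1 and variance
(2e^-1 - 5e^-2)/n, and that n counts blocks, i.e. sample size
plus one (a sample of n-1 observations yields n blocks). The
function name `normal_approx_critical_value` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_approx_critical_value(n_blocks, alpha):
    """Upper-alpha critical value of Fn under the normal approximation.

    Assumption: mean e^-1 and variance (2e^-1 - 5e^-2)/n, with n taken
    to be the number of blocks (sample size + 1).
    """
    mean = math.exp(-1.0)
    variance = (2.0 * math.exp(-1.0) - 5.0 * math.exp(-2.0)) / n_blocks
    z = NormalDist().inv_cdf(1.0 - alpha)
    return mean + z * math.sqrt(variance)


for sample_size in (20, 30, 50):
    value = normal_approx_critical_value(sample_size + 1, 0.05)
    print(sample_size, round(value, 5))
```

Under these assumptions the value for a sample of size 20 at the
.05 level is about .455.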


To check the accuracy of the normal approximation, Franke
and Jayachandran [Ref. 4] generated 80,000 samples of sizes
20, 30 and 50. Table II contains the empirical significance
levels, when the normal approximation was used to determine
the critical values for the Fn test.


TABLE II: EMPIRICAL SIGNIFICANCE LEVEL OF THE FOUTZ Fn TEST


Sample Size                      20           30           50
Nominal Significance Level
        .10                 [illegible]     .0800        .0859
        .05                 [illegible]     .0399        .0428
        .01                 [illegible]  [illegible]  [illegible]


It is clear that the rejection rates given in Table II
are consistently lower than the nominal values. More
accurate critical values were therefore determined from the
80,000 Fn values and are presented in Table III.


TABLE III: APPROXIMATE CRITICAL VALUES FOR Fn TEST


Sample Size                 20           30           50

Significance Level

      .10                 .42714     [illegible]    .40816
                         (.43586)     (.42383)     (.41150)
      .05                 .44865     [illegible]  [illegible]
                         (.45513)     (.43969)     (.42386)
      .01                 .48659     [illegible]    .44487
                         (.49127)     (.46944)     (.44706)


Values in parentheses are those obtained from the normal
approximation given by Foutz.
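
Empirical critical values of this kind can be approximated by
simulating the null distribution directly. The sketch below is
not the procedure of Franke and Jayachandran; it assumes that,
under the null hypothesis, the probability contents of the
statistically equivalent blocks behave like uniform spacings (a
flat Dirichlet vector), and that a sample of size 20 gives 21
blocks. The function names and the seed are hypothetical.

```python
import random


def null_fn_draw(n_blocks, rng):
    """One draw of Fn under H0, assuming uniform-spacing block contents."""
    # Normalised independent exponentials form a flat Dirichlet vector.
    weights = [rng.expovariate(1.0) for _ in range(n_blocks)]
    total = sum(weights)
    return sum(max(0.0, 1.0 / n_blocks - w / total) for w in weights)


def empirical_critical_value(sample_size, alpha, reps=20000, seed=1):
    """Upper-alpha sample quantile of simulated null Fn values."""
    rng = random.Random(seed)
    draws = sorted(null_fn_draw(sample_size + 1, rng) for _ in range(reps))
    return draws[min(reps - 1, int((1.0 - alpha) * reps))]


print(round(empirical_critical_value(20, 0.05), 3))
```

Because the null distribution of Fn is distribution free, no
normal variates are needed for this step; only the number of
blocks matters.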







III. DESCRIPTION OF THE SIMULATION


In order to check the efficacy of the Foutz test as a
multivariate goodness-of-fit test a simulation was run to
generate sample data from various bivariate and trivariate
normal distributions. The hypothesis tested in each case
is that the sample is from a multivariate normal
distribution with mean vector 0 and covariance matrix the
identity I. The rectangular and the polar/spherical methods
of blocking were both used and compared as to their effect
in each case.

To validate the blocking schemes, the null hypothesis
is tested against data generated from the distribution
N(0,I). Bivariate and trivariate sample sizes of 20, 30
and 50 are used to compute the Fn statistic which is then
compared to the empirical critical levels found in Table III.
Rejection rates are based on the number of rejections in
20,000 replications for each sample size. Comparing the
null rejection rates to the nominal significance level
used, as shown in Table III, provides evidence supporting
both blocking methods as all null rejection rates are close
to the significance level used.

The empirical power of the test was then investigated
by varying the distribution tested. This investigation is
accomplished in three different ways. First, the mean is
shifted away from the 0 vector while leaving the covariance
as the identity matrix. This is done to investigate the
ability of the test to detect location shifts. The covariance
matrix is then changed from the identity while leaving the
mean as the 0 vector. This is accomplished by changing the
diagonal elements alone to investigate variance shifts and
then shifting the off-diagonal elements by themselves to
check the effect of covariance shifts. A primary sample size
of 20 was chosen for comparison and 5000 replications were
used to compute rejection rates for each distribution tested.
Mixing of the three types of shifts is also simulated to
investigate the possible confounding effects of the three
shifts. Finally, sample sizes of 30 and 50 are run on a few
of the distributions to determine the effect of increasing
the sample size.
The various multivariate normal distributions are simulated
in the following manner. Univariate normal(0,1) pseudorandom
deviates are obtained from the LLRANDU series by Lewis [Ref. 5]
and grouped to form a p-variate N(0,I) vector. Taking $Z$ as
the p-variate N(0,I) vector, the random variable is
transformed by

$$X = CZ + \mu$$

where

$$CC' = \Sigma,$$

resulting in a vector $X$ which is distributed as
$N(\mu, \Sigma)$. The Foutz test is then applied to each of the
samples consisting of (n-1) X's.
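
The generation step can be sketched in Python as follows. This is
only an illustration: it assumes the transformation takes the
form X = CZ + mu with C a lower triangular (Cholesky) factor of
the covariance matrix, and it uses Python's `random.gauss` in
place of the generator cited above. The function names `cholesky`
and `mvn_sample` are hypothetical.

```python
import math
import random


def cholesky(sigma):
    """Lower triangular C with C C' = sigma (sigma positive definite)."""
    p = len(sigma)
    c = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c


def mvn_sample(mu, sigma, size, rng):
    """Draw `size` vectors X = C Z + mu, with Z a vector of N(0,1) deviates."""
    c = cholesky(sigma)
    p = len(mu)
    sample = []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [mu[i] + sum(c[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        sample.append(x)
    return sample


# A bivariate alternative: mean shift in the first coordinate, correlation .5.
rng = random.Random(0)
print(mvn_sample([1.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], 3, rng))
```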

An example using a bivariate sample helps illustrate the
blocking procedure used. Let $X_1, X_2, \ldots, X_{n-1}$ be the
simulated bivariate sample. The first cut is made on
$X_1^{(1)}$, or the first coordinate of the first vector $X_1$.
Two blocks are formed,

             First                        Second
             Coordinate                   Coordinate
$B_1$    =   $(-\infty, X_1^{(1)}]$       $(-\infty, +\infty)$
$B_2$    =   $(X_1^{(1)}, +\infty)$       $(-\infty, +\infty)$

$X_2$ is taken next and determined to be contained in block
$B_1$ or $B_2$. Suppose $X_2$ is in block $B_2$. $B_2$ is then
partitioned by $X_2^{(2)}$, or the second coordinate of sample
$X_2$. Three blocks are now defined as,

             First                        Second
             Coordinate                   Coordinate
$B_1$    =   $(-\infty, X_1^{(1)}]$       $(-\infty, +\infty)$
$B_{21}$ =   $(X_1^{(1)}, +\infty)$       $(-\infty, X_2^{(2)}]$
$B_{22}$ =   $(X_1^{(1)}, +\infty)$       $(X_2^{(2)}, +\infty)$

This procedure is continued by examining the next vector
in the random sample, locating the block that it is contained
in and partitioning the block by the designated coordinate.
The coordinate cutting functions used are alternated starting
with the first coordinate for the first cut. Coordinate
ranges, as shown, are used to designate blocks and the process
is continued until n blocks are so defined. Given any random
sample this method can be shown to be equivalent to a unique
permutation vector K and a set of cutting functions $\{h_k\}$ as
defined in Section II.
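
The sequential procedure just described can be sketched in Python
as below. This is a hedged illustration, not the Appendix code:
each block is stored as a list of (lower, upper] coordinate
ranges, and for trivariate data the coordinates are assumed to be
cycled in order. The function name `rectangular_blocks` is
hypothetical.

```python
import math


def rectangular_blocks(sample):
    """Split the sample space into len(sample) + 1 rectangular blocks.

    Each observation in turn cuts the block that contains it, using the
    coordinate given by its position in the sample (first, second, ...,
    then cycling; the cycling for p > 2 is an assumption).
    """
    p = len(sample[0])
    blocks = [[(-math.inf, math.inf)] * p]
    for j, x in enumerate(sample):
        coord = j % p
        # Locate the block containing x (ranges are open below, closed above).
        idx = next(i for i, block in enumerate(blocks)
                   if all(lo < v <= hi for v, (lo, hi) in zip(x, block)))
        block = blocks.pop(idx)
        lo, hi = block[coord]
        lower, upper = list(block), list(block)
        lower[coord] = (lo, x[coord])
        upper[coord] = (x[coord], hi)
        blocks[idx:idx] = [lower, upper]
    return blocks


for block in rectangular_blocks([(0.0, 0.0), (1.0, 2.0)]):
    print(block)
```

With the two observations shown, the second lies in the upper
block of the first cut, so that block is split on its second
coordinate, giving three blocks in all.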

After the formation of the statistically equivalent blocks,
each block has the probability content of 1/n and must be
compared to the hypothesized content using the statistic

$$F_n = \sum_{i=1}^{n} \max\left[0, \frac{1}{n} - D_i\right]. \qquad (2)$$



$D_i$, the probability content of each block under the null
hypothesis, is defined by the integral of the null density
over the block. The integral of the multivariate normal (0,I)
over a rectangular block yields

$$D_i = \int_{B_i} (2\pi)^{-p/2} e^{-x'x/2} \, dx. \qquad (3)$$

This reduces to a product of integrals of the univariate
marginal densities, one per coordinate range, which may be
easily evaluated with many available routines, eliminating
the need for numerical integration.
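
A minimal Python sketch of this reduction follows, assuming each
block is given as a list of (lower, upper) coordinate ranges and
the null distribution is N(0,I). The standard library
`statistics.NormalDist` stands in for the univariate normal
probability routine; the function name `rect_block_prob` is
hypothetical.

```python
from statistics import NormalDist


def rect_block_prob(block):
    """N(0,I) content of a rectangular block: product of marginal contents."""
    cdf = NormalDist().cdf
    prob = 1.0
    for lo, hi in block:
        prob *= cdf(hi) - cdf(lo)
    return prob


inf = float("inf")
# Lower half-plane of the first coordinate, then the positive quadrant.
print(rect_block_prob([(-inf, 0.0), (-inf, inf)]))
print(rect_block_prob([(0.0, inf), (0.0, inf)]))
```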


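In spherical coordinates $D_i$ is represented by

$$D_i = \int_{\theta_1}^{\theta_2} \int_{\phi_1}^{\phi_2} \int_{\rho_1}^{\rho_2} (2\pi)^{-3/2} e^{-\rho^2/2} \rho^2 \sin\phi \; d\rho \, d\phi \, d\theta.$$

Upon separation,

$$D_i = \int_{\phi_1}^{\phi_2} \tfrac{1}{2} \sin\phi \, d\phi \int_{\theta_1}^{\theta_2} (2\pi)^{-1} \, d\theta \int_{\rho_1}^{\rho_2} (2/\pi)^{1/2} \rho^2 e^{-\rho^2/2} \, d\rho. \qquad (5)$$

Noting that with a change of variables the third integrand
is a Chi-square density with 3 degrees of freedom, we may
use a closed form expression to evaluate $D_i$ as follows:

$$D_i = \left[ \tfrac{1}{2} (\cos\phi_1 - \cos\phi_2) \right] \times \left[ \frac{\theta_2 - \theta_1}{2\pi} \right] \times \left[ \chi^2_{3df}(\rho_2^2) - \chi^2_{3df}(\rho_1^2) \right] \qquad (6)$$

where

$$\chi^2_{3df}(\rho_i^2) = P[\chi^2_{3df} \le \rho_i^2], \quad i = 1, 2.$$
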


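For bivariate data the use of polar coordinates leads to a
similar simplification, leaving $D_i$ in the form

$$D_i = \left[ \frac{\theta_2 - \theta_1}{2\pi} \right] \times \left[ \chi^2_{2df}(\rho_2^2) - \chi^2_{2df}(\rho_1^2) \right].$$
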

After the calculation of the probability contents $D_i$ for
the n blocks, equation (2) is used to evaluate the Fn
statistic for each generated sample. The statistic is then
compared to the critical values found in Table III to decide
if the null hypothesis is accepted or rejected. Rejection
rates are defined by the number of rejections divided by the
number of replications in a given run. The rejection rates
thereby define an empirical power for the simulated
distribution.
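
The rejection-rate calculation can be illustrated with a small
univariate analogue, sketched below in Python. It is not the
bivariate or trivariate simulation of this thesis: it assumes the
identity cutting function, so that the blocks are the intervals
between order statistics, tests N(0,1), and uses the sample size
20, .05 level critical value .44865 from Table III. The function
names, seed and replication count are hypothetical.

```python
import random
from statistics import NormalDist

CRITICAL_05_N20 = 0.44865  # Table III: sample size 20, .05 level


def univariate_fn(sample):
    """Fn for H0: N(0,1), with blocks between consecutive order statistics."""
    cdf = NormalDist().cdf
    cuts = [0.0] + sorted(cdf(x) for x in sample) + [1.0]
    n = len(cuts) - 1  # number of blocks = sample size + 1
    return sum(max(0.0, 1.0 / n - (b - a)) for a, b in zip(cuts, cuts[1:]))


def rejection_rate(shift, critical_value, sample_size=20, reps=2000, seed=7):
    """Fraction of N(shift, 1) samples for which Fn exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(shift, 1.0) for _ in range(sample_size)]
        if univariate_fn(sample) > critical_value:
            rejections += 1
    return rejections / reps


print(rejection_rate(0.0, CRITICAL_05_N20))  # null case
print(rejection_rate(1.0, CRITICAL_05_N20))  # mean shifted by one s.d.
```

The first rate estimates the significance level actually attained
and the second estimates the power against a one standard
deviation shift in mean.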

The major component of the Fortran simulation program
used to evaluate the Foutz statistic for a given sample is
available in the Appendix. It has been adapted for use for
sample sizes up to 50, with redimensioning being needed for
larger sample sizes. The program is applicable for fitting
data from any hypothesized multivariate normal distribution
and provides the Fn statistic as computed by both blocking
methods presented. The code is self-contained except for
three IMSL routines, LUDECP, MDNOR, and MDCH [Ref. 6]. These
subroutines provide matrix decomposition, univariate normal
probabilities and chi-square probabilities, respectively,
and must be available or substituted prior to utilization
of the program.




IV. RESULTS AND CONCLUSIONS


The results of the simulation are summarized in
Tables IV-XV. Rejection rates are given by the distribution
tested and the significance level used. Empirical power
curves are presented in Figures 3-8. Rejection rates are
plotted against the magnitude of the shift in mean,
variance and covariance for the distribution tested. All
power curves are based on 5000 replicated samples and
were compared at the $\alpha = .05$ significance level.

The results for the case in which the distribution of
the samples is the same as the hypothesized distribution
N(0,I) are given in Tables IV and V. The rejection
levels obtained are close to the nominal significance
level for both blocking methods. No distinct pattern of
variation about the prescribed levels is discernible for
either method, as expected.
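
How close is "close" can be gauged from the binomial standard
error of a rejection rate. The short Python calculation below
assumes independent replications and uses the 20,000 replications
stated for the null runs; the function name `binomial_se` is
hypothetical.

```python
import math


def binomial_se(p, reps):
    """Standard error of a rejection rate estimated from `reps` replications."""
    return math.sqrt(p * (1.0 - p) / reps)


for alpha in (0.01, 0.05, 0.10):
    se = binomial_se(alpha, 20000)
    print(alpha, round(se, 5), round(alpha - 2 * se, 4), round(alpha + 2 * se, 4))
```

At the .05 level the standard error is about .0015, so an observed
null rejection rate such as the .0488 reported in Table IV for
rectangular blocking with N = 20 lies well within two standard
errors of the nominal level.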

The rejection rates for mean shifts are given in Tables
VI-VII and Figures 3-4. Shifts in the mean vector are
detected well; a shift of one standard deviation in a
single coordinate resulted in a 60% rejection rate for
bivariate or trivariate data. Greater shifts in mean led
to even higher rejection rates. The rectangular method
of blocking consistently gave about a 10% improvement over
the polar/spherical method in detecting mean shifts.

Results for variance shifts are contained in Tables
VIII and IX and the power curves are given in Figures 5
and 6. The Foutz test did not detect small variance
shifts very well but the performance of the test was far
better for larger shifts or shifts in more than one
coordinate. No one method of blocking performed better in
all cases but in general the polar/spherical method seemed
to outperform the rectangular method for detecting variance
shifts.

The results for changes in covariance are summarized
in Tables X, XI and Figure 7. Covariance shifts are not
detected well for either blocking method except for highly
correlated data with the correlation coefficient equal to
.9. The polar/spherical coordinate blocking method appeared
to perform a little better than the rectangular coordinate
method of blocking, but in general the simulation revealed
that the Fn test is not very powerful against covariance
shifts.

The empirical power for combinations of shifts in mean
and variance or covariance are presented in Tables XII and
XIII. Entries are based on an $\alpha = .05$ significance level
and are tabled by the mean vector and covariance matrix
of the sample data. Entries farther down and to the right
correspond to greater shifts in mean and variance/covariance
and are generally larger, as is to be expected. There are
no apparent confounding problems due to shifts in both
parameters. The rectangular method of blocking, however,
did outperform the polar/spherical method for most cases
of multiple shifts.

The results indicative of the effect of increasing the
sample size are summarized in Tables XIV and XV. Results
for sample sizes of 20, 30, and 50 are given for some
representative cases. The tables reveal higher rejection
rates for larger sample sizes with increases being
comparable for both blocking methods.

This study was limited to the two and three variate
normal distribution. There are many problems for further
research. Of primary concern is the generation of
percentage points of Fn for various values of n. The
intractability of the problem of obtaining the exact
distribution requires an empirical approach to finding a
correction to the asymptotic approximation given by Foutz.
Since the use of coordinates as cutting functions worked
well, the method should be tried for other distributions
and higher dimensions.

In conclusion, the Fn test is found to be a viable
option for testing goodness-of-fit of multivariate normal
distributions. These encouraging empirical results indicate
further study should be conducted to explore the potential
of this test for other distributions.





TABLE IV: NULL EMPIRICAL REJECTION LEVELS FOR
THE BIVARIATE NORMAL DISTRIBUTION


Significance Level          .01          .05          .10
Blocking Method


N = 20
   Rectangular             .0098        .0488        .0940
   Polar                   .0096        .0482     [illegible]
N = 30
   Rectangular          [illegible]  [illegible]     .0944
   Polar                [illegible]     .0454        .0890
N = 50
   Rectangular          [illegible]     .0498     [illegible]
   Polar                [illegible]     .0484     [illegible]


BASED ON 20,000 REPLICATIONS
APPENDIX A


USER REQUIREMENTS AND INPUT FORMAT FOR PROGRAM FOUTZ


The use of the computer program contained in Appendix B
requires the sample size, number of variates, applicable
data and the multivariate normal distribution being tested
as described by the mean vector and the variance-covariance
matrix. The variables containing the required inputs as
well as the required input format are as shown below.


DESCRIPTION OF VARIABLES


N ----------------- Sample Size

M ----------------- Number of Variables (2 or 3)

SIGMA1 ------------ Variance-Covariance Matrix
                    (MxM)

B1 ---------------- Mean Vector (Mx1)

X ----------------- Matrix of Sample Data (MxN)


INPUT FORMAT


N,M--------------- (2S) 

SS A (3F12.6) Input M Rows 
A a an GEEZ <6.) Input M Rows 
aaa (clrPeZ 6) Slinputet! Rows 


Input data is echoed in the output, providing a check
for correct entry of data, as is the decomposition of
SIGMA1. The Fn statistic as computed by both methods of
blocking follows, completing the output given for a single
run. An example run is given for trivariate data of sample
size 10.


SAMPLE TRIVARIATE RUN


APPENDIX B

      JR=N*(I-1)
      DO 110 J=1,N
      JK=JQ+J
      HOLD=A(JK)
      JI=JR+J
      A(JK)=-A(JI)
  110 A(JI)=HOLD
  120 J=M(K)
      IF(J-K) 100,100,125
  125 KI=K-N
      DO 130 I=1,N
      KI=KI+N
      HOLD=A(KI)
      JI=KI-K+J
      A(KI)=-A(JI)
  130 A(JI)=HOLD
      GO TO 100
  150 RETURN
      END
C     ..................................................
C     SUBROUTINE TRANS
C     PURPOSE: TO TRANSFORM OBSERVATIONS TO N(0,I)
C              UNDER THE NULL HYPOTHESIS. USES INPUT
C              VALUES OF B1 AND THE MATRIX C FROM DECOMP
C              TO TRANSFORM THE DATA ENTERED USING,
C              X = C(X-B1).
C     ..................................................
      SUBROUTINE TRANS(M,XTT,B1,TRAN,C,XTTR)
      DIMENSION B1(M,1),XTT(M,1),XTTR(M,1),TRAN(M,1),C(M,M)
      CALL SUB(XTT,B1,XTTR,M,1)
      CALL PRD(C,XTTR,TRAN,M,M,1)
      RETURN
      END
C     ..................................................
C     SUBROUTINE SUB
C     PURPOSE
C        SUBTRACT ONE MATRIX FROM ANOTHER TO
C        FORM RESULTANT MATRIX.
C     ..................................................
      SUBROUTINE SUB(A,B,R,N,M)
      DIMENSION A(1),B(1),R(1)
C     CALCULATE NUMBER OF ELEMENTS
      NM=N*M
C     SUBTRACT MATRICES
      DO 10 I=1,NM
   10 R(I)=A(I)-B(I)
      RETURN
      END
C
C     ..................................................
C     SUBROUTINE PRD
C     PURPOSE
C        MULTIPLY TWO MATRICES TO FORM A
C        RESULTANT MATRIX.